APHASIA THERAPY SYSTEM 

BACKGROUND OF THE INVENTION 

This invention relates to speech therapy. More particularly, this invention relates to a 
computer-based system that can be used to provide speech therapy to a person such as an 
aphasia patient. Still more particularly, this invention relates to a computer-based system 
that can be operated by an aphasia patient or other user in order to provide speech therapy
that is self-paced and does not require the participation of a human speech therapist. 

Aphasia is a language disorder caused by stroke or other injury to the brain; some 
form of aphasia afflicts over two million Americans. Aphasia interferes with the ability to 
select words and assemble the selected words in accordance with syntax rules to form 
sentences communicating an intended meaning. Speech therapy to improve the capabilities
of aphasic patients has traditionally been provided by human speech therapists who, in the
course of a therapy session, instruct the patient as to linguistic tasks to be performed and 
evaluate the patient's performance of the tasks. While such therapy can be effective, there 
are several drawbacks in the use of humans to supervise and monitor patients practising to 
regain speech capabilities. Use of a trained speech therapist is expensive, and patient access
to such therapists may be limited by economic considerations or by the scarcity of therapists.
Therapy sessions generally must be conducted when time with a therapist can be scheduled, 
rather than on an ad hoc basis when a patient desires therapy. Moreover, a therapist 
appearing to wait impatiently while an aphasic patient struggles unsuccessfully to find a 
word may make the patient feel uncomfortable with the therapy and inhibit progress. 

Computer technology has been employed to assist aphasic patients. However, known
computer-based systems do not provide an aphasic patient with speech therapy of the sort
conducted by human therapists. 

SUMMARY OF THE INVENTION 
It is therefore a general object of the invention to provide a method for assisting
aphasic patients in improving their speech that does not require a human therapist. It is a
further object of the invention to provide a computer-based system implementing such a speech
therapy method. In accordance with the invention, a computer-operated system is provided 
that includes speech input, speech recognition and natural language understanding, and audio 
and visual outputs to enable an aphasic patient to conduct self-paced speech therapy 
autonomously. The system of the invention conducts a therapy exercise by displaying a 
picture; generating a speech prompt asking the patient for information about the picture; 
receiving the patient's speech response and processing it to determine its semantic content; 
determining whether the patient's response was correct; and outputting feedback to the 
patient. Preferably the system includes a touch screen as a graphical input/output device by 
which the patient controls the therapy exercise. Preferably the system of the invention 
conducts such therapy exercises in a variety of grammatical structures with which an aphasic 
patient may need retraining. These and other objects and features of the invention will be 
understood with reference to the following specification and claims, and the drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram illustrating the functional elements of a preferred system in 
accordance with the present invention. 

Fig. 2 is a flow diagram illustrating the basic operation of a speech therapy system in 
accordance with the present invention. 

Fig. 3 is a more detailed flow diagram illustrating the operation of a preferred speech 
therapy system in accordance with the present invention. 

Figs. 4 and 5 are illustrations of visual displays that may be generated by a speech 
therapy system in accordance with the present invention. 

DETAILED DESCRIPTION 

The present invention is preferably implemented as a multimedia computer system 
including audio input and output and a visual display, the computer system operating in 
accordance with stored instructions to effect the method of the present invention and provide 
a speech therapy system in accordance with the present invention. The preferred system 
incorporates speech recognition, a natural language understanding system, a touch screen, 
and pictures in a series of exercises designed to retrain specific grammatical structures. The 
system is preferably able to conduct exercises that progress from simple active sentences 
("the fireman carries the ballerina") to prepositional phrases ("the bird is behind the glass"), 
notoriously difficult for most aphasic patients, and to more complex structures incorporating 
prepositional phrases ("the man throws the book from the tower"). Patients view pictures
and describe the events depicted in these pictures by speaking into a microphone. Since all 
aphasics experience word-finding difficulties, a touch screen is desirably incorporated into 
the system; with a touchscreen patients can find out the name of a pictured item by touching 
it, and cues for verbs and prepositions may be made available via icons. The speech 
recognizer and the natural language understanding system interpret the patients' utterances, 
giving feedback (preferably both musical and visual) about the correctness of their picture 
descriptions. There are several reasons why spoken natural language understanding is so 
desirable in a therapy system for aphasic patients. For one thing, it allows aphasic patients to 
practice language in the most important modality, speech. Also, many aphasics have severe 
difficulty with reading and writing, and simply could not use a computer-based therapy 
system if it required these skills. In addition, natural language processing capability is 
necessary in order to give patients feedback about whether they are putting words together 
into sentences correctly. A system with a speech recognizer alone, without natural language 
understanding, could detect whether the patient has produced a single word or string, but in 
order to assess the correctness of a spoken sentence it is necessary to analyse its grammatical 
structure and compare its meaning with the picture on display, since the same idea can be 
expressed in different ways. Against the foregoing background, preferred embodiments of 
the invention as depicted in the drawings will be described. 

Fig. 1 is a block diagram representing at a very general level the functional elements 
of a computer-based speech therapy system in accordance with the present invention. The 
system includes a processor 10 operating under control of software stored in memory 12. A 
microphone 14 converts input sounds into electrical signals that are digitized by A/D 
converter 16 and stored in memory 12 under control of processor 10. Stored data 
representing sounds are converted to analog signals by D/A converter 20 and output as 
sounds by speaker 18. The system of Fig. 1 may be implemented using a conventional 
personal computer (PC) having a sound card coupled to a microphone and speaker. Visual 
information is output to a user by a visual display 22, such as a CRT or LCD display. 
Preferably the system has a graphical user interface whereby control information to navigate 
the system is input from a user by a GUI input device 24 such as a mouse, trackball or the like;
but because aphasic patients may have difficulty operating such devices, it is particularly
preferred that input device 24 comprises a touch screen overlying visual display 22. The 
system hardware in Fig. 1 is generally conventional and may be embodied by a PC; software 
stored in memory 12 operates the system to conduct speech therapy in accordance with the 
present invention. Applicants have developed such a system in which the software included 
three principal functional elements as shown in Fig. 1: a speech recognizer 26, a natural 
language engine 28, and a PC application 30. Applicants used speech recognition software 
commercially available from Lernout & Hauspie Speech Products for speech recognizer 26. 
Applicants developed a natural language engine 28 for speech therapy using the Pundit 
natural language development environment of Unisys Corporation. Applicants also 
developed a PC application 30 to interface the natural language engine 28 to the PC system 
hardware and software. It is believed to be well within the ordinary skill in the art to develop 
specific natural language processing and interface software to implement such particular 
therapy systems as may be desired. 
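The division of labor among speech recognizer 26, natural language engine 28 and PC application 30 can be pictured as a simple pipeline. The Python sketch below is illustrative only: the interface names `SpeechRecognizer`, `LanguageEngine` and `TherapyApplication` and their methods are hypothetical, and they do not reflect the actual Lernout & Hauspie or Pundit programming interfaces, which are not described here. Two toy stand-ins let the sketch run without any speech software.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Role of speech recognizer 26: digitized speech in, word strings out."""
    def recognize(self, audio: bytes) -> list[str]: ...


class LanguageEngine(Protocol):
    """Role of natural language engine 28: judge words against an expected answer."""
    def evaluate(self, words: list[str], expected: str) -> bool: ...


@dataclass
class TherapyApplication:
    """Role of PC application 30: pass data from one component to the next."""
    recognizer: SpeechRecognizer
    engine: LanguageEngine

    def judge(self, audio: bytes, expected: str) -> bool:
        words = self.recognizer.recognize(audio)
        return self.engine.evaluate(words, expected)


# Toy stand-ins (hypothetical) so the sketch runs without speech software.
class TextAsSpeech:
    def recognize(self, audio: bytes) -> list[str]:
        # Treat the bytes as an already-transcribed utterance.
        return audio.decode("utf-8").lower().split()


class KeywordEngine:
    def evaluate(self, words: list[str], expected: str) -> bool:
        return expected in words


app = TherapyApplication(TextAsSpeech(), KeywordEngine())
```

Because the application depends only on the two interfaces, a commercial recognizer or a different language engine could in principle be swapped in without changing the application logic.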

Fig. 2 is a flow diagram illustrating the basic operation of a speech therapy system in 
accordance with the present invention. In step 40, the system displays a picture on the 
display 22 of Fig. 1 based upon data stored in memory 12. The pictures are selected or 
created so as to provide particular linguistic elements or structures that are desired to be 
tested or practised in a therapy session. In step 42, the system outputs a prompt to the user 
for a word to describe the picture, or an aspect of the picture. Aphasic patients may have 
difficulty reading a text prompt, and since the system is intended to exercise a patient's 
spoken language capabilities, preferably the prompt generated in step 42 is a computer- 
generated speech prompt that is output from speaker 18 based upon data stored in memory 
12. In order to provide accurate and natural speech prompts as well as spoken feedback to 
the patient, applicants prefer to generate them from digitized and stored human speech rather 
than synthesizing them. In step 44, the patient's speech response to the prompt is input, 
digitized, and stored. In step 46, speech recognition is performed upon the data representing 
the patient's speech response, to provide data representing the words comprising the 
response. In step 48, the data representing the words comprising the patient's speech 
response is subjected to natural language analysis to evaluate its correctness, i.e. did the 
patient correctly give the information about the picture that was requested by the prompt of 
step 42. In step 50, the system outputs feedback to the patient regarding the correctness of 
the patient's response. Preferably the feedback includes both spoken feedback telling the
patient whether or not the response was correct and musical feedback to make using the
system enjoyable for the patient, such as a fanfare for correct responses and a horn for
incorrect responses. The system may then return to step 42 and generate a prompt for
additional information about the same picture, or to step 40 and display a new picture. 
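The loop of Fig. 2 can be sketched as follows. This is a minimal illustration, assuming hypothetical `Picture` and `Question` records and caller-supplied functions for display, audio output, recording, recognition, correctness checking and feedback. None of these names come from the system described here; they stand in for the hardware and software elements of Fig. 1.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    prompt: str           # spoken prompt played in step 42
    accepted: set[str]    # answers the analysis of step 48 should accept


@dataclass
class Picture:
    image_file: str       # stored picture shown in step 40
    questions: list[Question]


def run_session(
    pictures: list[Picture],
    show: Callable[[str], None],                        # drives visual display 22
    say: Callable[[str], None],                         # plays stored speech via speaker 18
    listen: Callable[[], bytes],                        # records from microphone 14
    recognize: Callable[[bytes], list[str]],            # speech recognition
    is_correct: Callable[[list[str], Question], bool],  # natural language analysis
    feedback: Callable[[bool], None],                   # spoken and musical feedback
) -> list[bool]:
    """Run steps 40-50 of Fig. 2 once per question and collect the judgements."""
    results = []
    for picture in pictures:
        show(picture.image_file)              # step 40: display the picture
        for question in picture.questions:
            say(question.prompt)              # step 42: speech prompt
            audio = listen()                  # step 44: input, digitize and store
            words = recognize(audio)          # step 46: speech recognition
            ok = is_correct(words, question)  # step 48: evaluate correctness
            feedback(ok)                      # step 50: feedback to the patient
            results.append(ok)
    return results


# Scripted example: the "patient" answers "punch" for the sailor picture.
pictures = [Picture("sailor.png",
                    [Question("What is a good verb for this action?", {"hit", "punch"})])]
replies = iter([b"punch"])
log = []
results = run_session(
    pictures,
    show=log.append,
    say=log.append,
    listen=lambda: next(replies),
    recognize=lambda audio: audio.decode("utf-8").split(),
    is_correct=lambda words, q: any(w in q.accepted for w in words),
    feedback=lambda ok: log.append("fanfare" if ok else "horn"),
)
```

The inner loop corresponds to returning to step 42 for more information about the same picture, and the outer loop to returning to step 40 for a new picture.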

Fig. 3 is a more detailed flow diagram illustrating the operation of a preferred speech 
therapy system in accordance with the present invention in an example of a therapy exercise 
that may generate displays as illustrated in Figs. 4 and 5. In the example, the picture in Figs. 
4 and 5 shows a man 100 dressed as a sailor hitting a box-like object 102. In step 60, the
system displays the picture in color. In step 62, the system outputs a prompt to the user for a
word to describe an aspect of the picture; for instance, the system might generate the spoken 
prompt "Please look at this picture. What is a good verb to describe the action here?" If the 
user cannot think of an appropriate word, a pair of buttons 108, 110 is provided to assist the
user with an auditory cue. In exercises to find appropriate verbs the buttons are designated
with the letter "v"; pressing the small v button 108 causes the system to output the initial
consonant + vowel sound of the desired verb, and pressing the large V button 110 outputs the
entire desired verb. In exercises to find appropriate prepositions the buttons might be 
designated with the letter "p". In response to the prompt, the user presses recording start 
button 104, vocalizes the response to the prompt, and presses recording stop button 106 at the
end of the response. Preferably pressing the record start button 104 causes visual feedback to
indicate that the system is recording, such as coloring the button. In step 64, the system
inputs this speech response and performs speech recognition and natural language analysis
on it, as in steps 44-48 of Fig. 2. The system is desirably programmed to accept certain
synonyms as appropriate responses to the prompts; for instance, the system may accept "hit"
and "punch" as appropriate verbs to describe the picture in Fig. 4. In step 66 the system outputs audio
feedback indicating its determination of the correctness of the user's response. Preferably, 
the feedback consists of both a musical output and a speech output. Thus if the response was 
incorrect, in step 66 the system might play harsh or sad music and, if the user has not made a 
predetermined number of attempts to correctly respond to the prompt, as determined in step
70, in step 72 the system might repeat the prompt: "Please try again. What is a good verb to
describe the action in this picture?" If the response was determined in step 64 to be
acceptable, the appropriate audio feedback is output in step 66. For instance, if the user said 
"punch", a correct response to the prompt, the system might play a musical fanfare and then 
say "you're right, punch is a good word for this picture, but the verb we will be using is hit" 
in order to alert the user to words the system will use in further prompts or spoken feedback. 
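Steps 62 to 72 for a single picture element can be sketched as a small function that returns the outputs the system would produce for a sequence of recognized answers. This is a sketch under stated assumptions: `ElementPrompt` and `ask_element` are hypothetical names, the attempt limit of 3 is an assumed value (the text says only "predetermined"), the wording of the exact-match message is assumed, and strings in angle brackets stand in for musical feedback.

```python
from dataclasses import dataclass


@dataclass
class ElementPrompt:
    prompt: str                # step 62 prompt, e.g. for the verb
    retry_prompt: str          # step 72 prompt after an incorrect answer
    target: str                # word the system itself will use ("hit")
    synonyms: frozenset[str]   # other acceptable words ("punch")
    max_attempts: int = 3      # hypothetical; the text says only "predetermined"


def ask_element(item: ElementPrompt, responses: list[str]) -> list[str]:
    """Return the outputs produced for a sequence of recognized answers.

    Follows steps 62-72 of Fig. 3: prompt, judge the answer, give feedback,
    and re-prompt until an answer is accepted or the attempt limit is reached.
    """
    outputs = [item.prompt]                                  # step 62
    for attempt, answer in enumerate(responses, start=1):
        word = answer.strip().lower()                        # result of step 64
        if word == item.target:
            outputs += ["<fanfare>", f"Yes, {word} is a good word for this picture."]
            return outputs
        if word in item.synonyms:                            # accepted synonym
            outputs += ["<fanfare>",
                        f"You're right, {word} is a good word for this picture, "
                        f"but the verb we will be using is {item.target}."]
            return outputs
        outputs.append("<sad music>")                        # step 66, incorrect
        if attempt >= item.max_attempts:                     # step 70
            break
        outputs.append(item.retry_prompt)                    # step 72
    return outputs


verb = ElementPrompt(
    prompt="Please look at this picture. What is a good verb to describe the action here?",
    retry_prompt="Please try again. What is a good verb to describe the action in this picture?",
    target="hit",
    synonyms=frozenset({"punch"}),
)
```

For the answers "kick" and then "Punch", the function yields the prompt, sad music, the retry prompt, a fanfare, and the message naming "hit" as the verb the system will use.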
In step 74, a determination is made as to whether there are further elements of the displayed 
picture to be prompted for. For instance, in addition to prompting for verbs, the therapy 
program might also provide practice in identifying sentence subjects or objects, or 
prepositions. If so, the system returns to step 62 and outputs a prompt for another element of 
the displayed picture. For instance, with the picture shown in Figs. 4 and 5, the system might 
generate a speech output of "What is the sailor hitting?" or "Who is hitting the box?" User 
cues are also provided for nouns; touching a displayed object causes a speech output of the 
name of the object. Once all elements of a picture that are to be prompted for individually 
have been prompted and responded to, the system seeks a sentence describing the entire
picture. To assist in giving feedback regarding partially correct responses, in step 76 the
picture is "decolorized" and rendered in black and white. In step 78, the system outputs a
prompt to the user for a sentence describing the entire picture; for instance, the system might
generate the spoken prompt "Now please try to say the whole sentence." The user's spoken
response to the prompt is again input, speech recognized, and natural language analyzed to
determine its correctness in step 80. It is here that the inclusion of natural language
processing gives the system of the invention the ability to autonomously conduct
useful therapy involving fairly complex utterances. By analyzing the semantic content of the
spoken response, the system can judge as correct a number of different utterances that are
equally appropriate but have different phraseology. For instance, for the picture of Figs. 4
and 5, by natural language processing of the recognized words in the patient's response, the
system can judge the responses "The man hits the box", "The sailor is hitting the block", and
"The box is being punched by the man" as equally correct. In step 82, audio and preferably
also visual feedback on the response is output. For correct responses, the system may loop
to step 60 and display a new picture; pressing stop button 112 will exit the therapy program.
For incorrect responses, the system permits retries in step 86. If the incorrect response is
partially correct, preferably the feedback in step 82 and prompt in step 88 identify the
correct parts of the response and prompt for the incorrect ones. Thus, if the patient's
response was "The sailor is hitting the squirrel", the system might 
provide visual feedback by colorizing the sailor to show that it was correctly identified and 
generate the spoken feedback and prompt "Yes, you're partially right, the sailor is hitting 
something. What is the sailor hitting?" 
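The idea behind steps 80 to 88, comparing the meaning of a sentence rather than its exact wording and reporting which parts were right, can be illustrated with a deliberately tiny frame matcher. This is not the Pundit-based engine described above, whose internals are not given here; it is a hypothetical keyword lexicon plus a naive active/passive rule, and the concept labels (`SAILOR`, `BOX`, `HIT`) and role names are assumptions. The follow-up questions are the example prompts from the text.

```python
import re

# Hypothetical lexicon for the picture of Figs. 4 and 5. Each word maps to a
# concept label, so synonyms ("man"/"sailor", "box"/"block", "hit"/"punch")
# collapse to the same meaning.
NOUNS = {"man": "SAILOR", "sailor": "SAILOR", "box": "BOX", "block": "BOX",
         "squirrel": "SQUIRREL"}
VERBS = {"hit": "HIT", "hits": "HIT", "hitting": "HIT",
         "punch": "HIT", "punches": "HIT", "punched": "HIT", "punching": "HIT"}

# Intended meaning of the picture, and a follow-up prompt for each role.
TARGET = {"agent": "SAILOR", "action": "HIT", "patient": "BOX"}
FOLLOW_UP = {"agent": "Who is hitting the box?",
             "action": "What is a good verb to describe the action here?",
             "patient": "What is the sailor hitting?"}


def parse(sentence: str) -> dict:
    """Map a simple active or passive sentence to agent/action/patient roles."""
    words = re.findall(r"[a-z]+", sentence.lower())
    nouns = [(i, NOUNS[w]) for i, w in enumerate(words) if w in NOUNS]
    action = next((VERBS[w] for w in words if w in VERBS), None)
    if "by" in words:  # passive: "the box is being punched by the man"
        by = words.index("by")
        agent = next((c for i, c in nouns if i > by), None)
        patient = next((c for i, c in nouns if i < by), None)
    else:              # active: "the man hits the box"
        agent = nouns[0][1] if nouns else None
        patient = nouns[1][1] if len(nouns) > 1 else None
    return {"agent": agent, "action": action, "patient": patient}


def judge(sentence: str) -> tuple[list[str], list[str]]:
    """Return (roles to colorize as correct, follow-up prompts for the rest)."""
    frame = parse(sentence)
    correct = [r for r in TARGET if frame[r] == TARGET[r]]
    prompts = [FOLLOW_UP[r] for r in TARGET if frame[r] != TARGET[r]]
    return correct, prompts
```

With this sketch, the three example sentences all match every role and produce no follow-up prompt, while "The sailor is hitting the squirrel" keeps the agent and action as correct (so the sailor would be colorized) and yields "What is the sailor hitting?". A real language engine needs a full grammatical parse; this naive rule would misread, for example, "The man by the door hits the box".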

Other buttons in the interface include hourglass button 116, which shows how
many minutes the patient has been working; parrot button 120, which replays the patient's last
response; and repeat button 114, which replays the current system prompt. 
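The hourglass, parrot and repeat buttons need only a little session state: a start time, the current prompt audio, and the patient's last recording. The class below is a hypothetical sketch of that state; the names `SessionControls` and `play_audio` are assumptions, audio is modeled as raw bytes, and the clock is injectable so the timer can be tested.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SessionControls:
    play_audio: Callable[[bytes], None]            # hypothetical audio output hook
    clock: Callable[[], float] = time.monotonic    # seconds; injectable for tests
    current_prompt: Optional[bytes] = None         # stored speech for the prompt
    last_response: Optional[bytes] = None          # patient's last recording
    started: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def hourglass(self) -> int:
        """Button 116: whole minutes the patient has been working."""
        return int((self.clock() - self.started) // 60)

    def parrot(self) -> None:
        """Button 120: replay the patient's last response, if any."""
        if self.last_response is not None:
            self.play_audio(self.last_response)

    def repeat(self) -> None:
        """Button 114: replay the current system prompt, if any."""
        if self.current_prompt is not None:
            self.play_audio(self.current_prompt)
```

Keeping the last recording available lets the patient hear their own attempt before recording again, without any involvement from a therapist.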

In accordance with the foregoing, a speech therapy system is provided that is 
extremely motivating: patients like being able to practice before turning on the speech 
recognizer (unlike a human, the computer does not wait impatiently while they struggle for 
words), and their speech has an immediate and obvious impact on the computer, which is, in 
a sense, empowering since aphasia represents a terrible loss of the ability to control one's 
environment through language. 

Variations on the systems disclosed herein and implementation of specific systems 
may no doubt be done by those skilled in the art without departing from the spirit and scope 
of the present invention. 



